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Abstract. 

Quantum Chromodynamics (QCD) is the fundamental theory for the interaction between 
quarks and gluons. It manifests as the short-range strong interaction inside the nucleus, and 
plays an important role in the evolution of the early universe, from the quark-gluon phase to the 
hadron phase. To solve QCD is a grand challenge, since it requires very large-scale numerical 
simulations of the discretized action of QCD on the 4-dimensional space-time lattice. Moreover, 
since quarks are relativistic fermions, the 5-th dimension is introduced such that massless quarks 
with exact chiral symmetry can be realized at finite lattice spacing, on the boundaries of the 5-th 
dimension, the so-called domain- wall fermion (DWF). In this talk, we discuss how to simulate 
lattice QCD with DWF such that the chiral symmetry can be preserved optimally with a finite 
extent in the 5-th dimension, and to outline the simulations which have been performing by the 
TWQCD Collaboration, and also to present some recent physical results. 



1. Introduction 

Quantum Chromodynamics (QCD) is the fundamental theory for the interaction between quarks 
and gluons. It provides the theoretical framework to understand the nuclear force/energy from 
the first principles. Morever, QCD plays an important role in the evolution of the early universe, 
from the quark-gluon phase to the hadron phase. Since quarks are relativistic fermions, they 
possess the chiral symmetry in the massless limit. The chiral symmetry forbids the additive 
mass renormalization which causes the fine-tuning problem associated with the scalar field. 
In QCD, the chiral symmetry [SUL{Nf) x SUji{Nf)] of Nj massless quarks is spontaneously 
broken to SUv{Nf), due to the strong interaction between quarks and gluons. This gives the 
(nearly) massless Goldstone bosons (pions) and their specific interactions. To investigate the 
spontaneously chiral symmetry breaking as well as the hadron physics from the first principles of 
QCD, it requires nonperturbative methods. So far, lattice QCD is the most promising approach, 
by discretizing the continuum space-time on a 4-dimensional lattice [Ij, and to compute physical 
observables by Monte Carlo simulations [2j. However, in lattice QCD, formulating lattice fermion 
with exact chiral symmetry at finite lattice spacing is rather nontrivial, which is realized by the 
domain- wall fermion (DWF) on the (4-|-l)-dimensional lattice [3J, and the overlap fermion on 
the 4-dimensional lattice [1]. 

Lattice QCD with exact chiral symmetry O H] is an ideal theoretical framework to study 
nonperturbative physics from the first principles of QCD. However, it is rather nontrivial to 



perform Monte Carlo simulation such that the chiral symmetry is preserved at a high precision 
and all topological sectors are sampled ergodically. 

Currently, there are three groups (RBC/UKQCD, JLQCD, TWQCD) around the world 
performing large-scale dynamical simulations of lattice QCD with exact chiral symmetry. Since 
the computational requirement for these dynamical simulations is 10 — 100 times of their 
counterparts using traditional lattice fermions (e.g., Wilson fermion, staggered fermion, and their 
variants), they are often performed with the state-of-the-art architectures. While RBC/UKQCD 
and JLQCD have been using IBM Blue Gene supercomputers, TWQCD has been using a GPU 
cluster since 2009 (currently consisting of 320 Nvidia CPUs, with sustained 100 Tflops/s). 

The RBC/UKQCD Collaborations have been using the conventional domain- wall fermion 
with the Shamir kernel [3 [B], which suffers from large chiral symmetry breaking (i.e., large 
residual mass), especially in the finite temperature QCD. On the other hand, the JLQCD 
Collaboration used the overlap fermion in a fixed topology which attains very good chiral 
symmetry but in the expense of sampling all topological sectors. To overcome the deficiencies of 
above two approaches, TWQCD Collaboration has been using the optimal domain-wall fermion 
(ODWF) [8l[9] to preserve the chiral symmetry, which not only attains a good chiral symmetry 
with a modest extension (e.g., Ng = 16) in the fifth dimension, but also samples all topological 
sectors ergodically. Recently, JLQCD Collaboration has started a new project [lOj, to use 
the conventional domian-wall fermion with the scaled Shamir kernel, performing dynamical 
simulations with more than 6 racks of IBM Blue Gene/Q supercomputer. In other words, now 
all 3 groups (RBC/UKQCD, JLQCD, TWQCD) are using domain- wall fermions (DWF) to 
perform large-scale dynamical simulations of lattice QCD. 

In this talk, we discuss how to simulate lattice QCD with DWF such that the chiral symmetry 
can be preserved optimally with a finite extent in the 5-th dimension, and to outline the 
simulations which have been performing by the TWQCD Collaboration, and also to present 
some recent physical results. 

Mathematically, optimal domain-wall fermion (ODWF) is a theoretical framework to preserve 
the chiral symmetry maximally with a set of analytical weights, {ujs,s = I,-- - ,Ns}, one for 
each layer in the fifth dimension [.8j. Thus the artifacts due to the chiral symmetry breaking 
with finite Ng can be reduced to the minimum, especially in the chiral regime. In general, the 
4-dimensional effective Dirac operator of massless ODWF can be written as [11] 

l + lls=l^« (1) 

^- = T^^' H = cH^iy^d^M'^. r= [2mo(l-dmo)]-\ 

where c and d are constants, and = Ts-C^,, with the usual Wilson-Dirac operator plus 
a negative parameter —mo (0 < mo < 2). Here SoptiH) = HRz{H), where Rz{H) is the 
Zolotarev optimal rational approximation of (i7^)~^/^ [T2|. 

Recently we have demonstrated that it is feasible to perform a large-scale dynamical QCD 
simulation with ODWF, which not only preserves the chiral symmetry to a good precision, but 
also samples all topological sectors ergodically [T3]. To recap, we perform HMC simulations of 
2 flavors QCD on a 16'^ x 32 lattice, with ODWF at Ng = 16 and plaquette gauge action 
at /3 = 5.95. Then we compute the low-lying eigenmodes of the overlap Dirac operator, 
and use its index to obtain the topological charge of each gauge configuration, and from 
which we compute the topological susceptibility for 8 sea-quark masses. Our result of the 
topological susceptibility agrees with the sea-quark mass dependence predicted by the NLO 
ChPT [T3], and provides the first determination of both the pion decay constant (Ft^ = 
92(12)(2) MeV) and the chiral condensate (S^(2 GeV) = [259(6)(7) MeV]^) simultaneously 



from the topological susceptibility. Furthermore, our recent results of the mass and the 
decay constant of the pseudoscalar meson [15] are also in good agreement with the sea- 
quark mass dependence predicted by NLO ChPT |16j . and from which we obtain the low- 
energy constants F, S, I3 and I4. With the low-energy constants, we determine the average 
up and down quark mass {Tn^f{2 GeV) = 4. 17(13) (19) MeV), and the chiral condensate 

(i;^'^(2 GeV) = [230(4)(6) MeV]^). Our results of the topological susceptibility together with 
the mass and decay constant of the pseudoscalar meson assert that the nonperturbative chiral 
dynamics of the sea-quarks are well under control in our HMC simulations. 

Recently we have extended our simulations to two sets of larger lattices: (i) 20^ x 40 x 16 with 
plaquette gauge action at /3 = 5.95, for 6 sea-quark masses corresponding to pion masses in the 
range 230-450 MeV; (ii) 24^ x 48 x 16 with plaquette gauge action at /3 = 5.95, for 4 sea-quark 
masses corresponding to pion masses in the range 230-450 MeV. Now, after simulating 2-flavors 
QCD on various lattices (16^ x 32, 20^ x 40, 24^ x 48), we are ready to perform dynamical 
simulations of (2 -|- l)-flavors and (2 + 1 + l)-flavors QCD on the 32'^ x 64 x 16 lattice, with pion 
mass close to the physical value. Our strategy will be outlined in the final section. 



2. Theoretical Aspects of Domain- Wall Fermions 

In general, for a given A^^ (the number of sites in the fifth dimension), the (mathematically) 
maximal chiral symmetry can be attained by the optimal domain-wall fermion (ODWF) [8j with 
the operator 



l)x'X'' ^ss' ) 



(2) 



where ps = ccog + d, as = ccog — d, and c, d are constants. The indices x and x' denote the sites 
on the 4-dimensional space-time lattice, while s and s' the layers in the fifth dimension. Here 
is the standard Wilson Dirac operator plus a negative parameter —mo (0 < mo < 2), 



{D^)xx' = -\Y1 ~ lti)Ut,{x)6^+f,,^' + (1 + 7^)C/^(x')-5x-/i,x' + (4 



moj 



(3) 



where Ufj_{x) denotes the link variable pointing from x to x + ft, 



L = P+L+ + 



P± = (1 ± 75)/2, 



and 



s = 1, r = l/[2mo(l — dniQ)] 
1 <s< N, 



(4) 



(5) 



The weights {ujs} along the fifth dimension are fixed according to the formula derived in [8] 
such that the maximal chiral symmetry is attained. In general, for other DWFs without 
maximal chiral symmetry, the weights {ps} and {dg} have different values. For example, for 
the conventional (Shamir) DWF which has been used by the RBC/UKQCD Collaborations, 
c = d = 1/2, and uig = 1, Vs; and for the scaled Shamir DWF (with scaling factor = 2) which is 
being used by the JLQCD Collaboration, c = 1, d = 1/2, and cOg = 1, Vs. 

The breaking of chiral symmetry due to finite Ng in the fifth dimension can be measured 
by the residual mass emerging in the axial Ward identity on the lattice. For the general DWF 
operator ([2]), the axial Ward identity is derived in Ref. [TT], and a new formula for the residual 
mass is also obtained, 



nir 



tic{Dc + mg)Q I 



tT:[{Dl+mg){Dc + mg)]Ql^ 



m, 



9' 



(6) 



{U} 



where {Dc + Tng)~^ denotes the valence quark propagator with niq equal to the sea-quark mass, 
tr denotes the trace running over the color and Dirac indices, and the subscript {U} denotes 
averaging over an ensemble of gauge configurations. Formula ([6|) is useful in practice, since it 
immediately gives the residual mass once the 12 columns of the quark propagator are computed. 
Moreover, an upper-bound for the residual mass of ODWF is derived in Ref. [11]. It asserts that 
for Ng less than some threshold value (~ 16 — 18), the residual mass is an exponentially decay 
function of A^^. This implies that ODWF can provide a viable way to preserve chiral symmetry 
to a good precision (e.g., mresO ~ 10~^) with a modest Ng (e.g., Ng ~ 16). 



3. HMC Simulation of Lattice QCD with Domain- Wall Quarks 

First of all, we perform the even-odd preconditioning on the DWF operator ([2]), which is essential 
for lowering the condition number as well as halving the memory consumption. Since Z?^ 
commutes with {p)ss' = Ps^ss' and {a)ss' = CTgdss', Eq. ([2]) becomes 

P(m,) = D^{p + aL) + (1 - L) = D^[cw{l + L) + d{l - L)] + {1 - L) (7) 

where {uj)s,s' = ^sSs,s' is a diagonal matrix in the 5-th dimension. Now, separating the even and 
the odd sites on the 4D space-time lattice, Eq. ([7]) can be written as 

^K)=(^OE° 4 J Ml + L) + d(l - L)] + (1 - L) = (^^^Ey X 
where 

X={A-mo)[cuj{l + L)+d{l-L)] + {l-L), Y = cuj{l + L) + d{l - L). (9) 
Now we further rewrite it in a more symmetric form by defining 

Ms = w-i/2yX-ia;i/2 = |(4 _ ^q) + ^-1/2 [^(1 + L)(l - L)-^ + duj-^] ~^ uj^^^^Y^ , (10) 

and 

S, ^ co-'/^YX-' = Nhuj-^l\ S2 ^ Y-'co'/\ (11) 

Then Eq. ^ becomes 

V(m)-S-'( ^ M,D^°\ ,( 1 0\(1 0\(1 M,D^^\ , 

(12) 

where the Schur decomposition has been used in the last equality, with the Schur complement 

C = l- M^D^'^M^Dl''. (13) 

Note that the quark mass dependence (m = rniq) only resides in M5 through L. 

Since detP = det5]~^ • detC • det5^^, and Si and S2 do not depend on the gauge field, 
we can just use C in the Monte Carlo simulation. After including the Pauli-Villars fields (with 
niq = 1/r), the pseudofermion action for 2 fiavors QCD (in the isospin limit niu = rud) can be 
written as 

Spf = 4>^Cl{CC^)-'Ci4>, Ci = C{rmq = l). (14) 

where (j) and (f)^ are complex scalar fields carrying the same quantum numbers (color, spin) of 
the quark fields. Including the gluon fields, the partition function for 2 fiavors QCD can be 
written as 

Z= f [dU][d(l)^][d(l)] exp (^-Sg[U] - (p^CliCC^y^Ci^^ , (15) 



where Sg[U] is the lattice action for the gauge field, e.g., the Wilson plaquette action 



S,[U]=f3Y,\^-l^^^'iUp)\, /3 = 4- (16) 

plaq. ^ J 9 

It is rather difficult to simulate (jlSp directly with the Metropolis algorithm. The conventional 
wisdom to handle this problem is to introduce a fictituous dynamics to guide the Monte Carlo 
simulation, i.e., the Hybrid Monte Carlo (HMC) [Hj- Since the pseudofermion action ()14p is 
positive-definite, (p can be generated by the heat-bath method with Gaussian noise rj satisfying 
the Gaussian distribution exp(— ry^ry). That is, to solve the following equation with the conjugate 
gradient algorithm 

Ci^ = Cr]^ ClCi<p = ClCri. (17) 

Then the fictituous molecular dynamics only involves the gauge fields {Ai} and their conjugate 
momenta {Pi}, where Ai = Aft'^ is the matrix- valued gauge field corresponding to the link 
variable Ui = exp(iAft'^). The Hamiltonian of the molecular dynamics is 

'^ = 1 E(^")' + SsiU] + cp^CliCC^r'C,^, (18) 

I, a 

and the partition function can be written as 

Z = J[dU][dP][d(P][d(l)^]exp{-n). (19) 
The Hamilton equations for the fictituous molecular dynamics are 

"^tir). m _p.,,,^^Ml)=,P,Mt,,M. (20) 



dT dP^{T) ' ' ' dT 

dPf{T) _ dn _ dSg dSpf 



(21) 



dT dAf{T) dAf{T) dAf{T) 

These two equations together imply that dTi/dT = 0, which gives 

as an alternative form of ()2ip . 

The algorithm of HMC simulation can be outlined as follows: 

(i) Choose an initial gauge configuration {Ui}. 

(ii) Generate P^ with Gaussian weight exp({P;'^}^/2). 

(iii) Generate rj with Gaussian weight exp(— 77^77). 

(iv) Compute </> according to (fT7|) . 

(v) With {(f)} held fixed, integrate (j20p and (j2ip by an algorithm (leapfrog/Omelyan) which 
ensures exact reversibility and area-preserving map in the phase space for any 5t. 

(vi) Accept the new configuration {U[} generated by the molecular dynamics with probability 
mm(l, e"^"^), where ATI = n{U[, P/) - TiiU, P). This completes one HMC trajectory. 

(vii) For the next trajectory, go to (ii). 



To summarize, we first generate random noise vector rj with Gaussian distribution, exp(— ry^ry), 
then we obtain (p = Ci^Cr] using the conjugate gradient (CG). With fixed (f), the system is 
evolved under a fictituous Hamiltonian dynamics, the so-called molecular dynamics (MD). In the 
MD, we use the Omelyan integrator [18j, and the Sexton- Weingarten multiple-time scale method 
|19j . The most time-consuming part in the MD is to compute the fermion force —dSpf/dAf{T) 
which is required for updating the conjugate momenta in Eq. ()2ip . since it involves solving the 
linear system {CC^)v = Cicj) with CG. Here we take advantage of the remarkable floating-point 
capability of the Nvidia GPU, and perform the CG with mixed precision [2D]. Moreover, the 
computations of the gauge force and the update of the gauge field are also performed in CPUs. 
In other words, the entire HMC trajectory is computed by GPUs. Furthermore, we introduce an 
auxiliary heavy fermion field with mass rriH {rriq <^ mn < 1/r) similar to the case of the Wilson 
fermion [21| . For two flavors QCD, the pseudofermion action (with Ch = C{mH)) becomes, 



-1 



which gives exactly the same fermion determinant of ()14p . Nevertheless, the presence of the 
heavy fermion field plays a crucial role in reducing the light fermion force and its fluctuation, thus 
diminishes the change of the Hamiltonian in the MD trajactory, and enhances the acceptance 
rate. 

In two flavors QCD, only the quantum fluctuations (internal fermion loops) of the lightest u 
and d quarks (in the isospin limit rriu = ma) are incorporated, while those of the heavier quarks 
(s, c, b and t) are neglected. To incorporate the dynamical s quark, one must use a pseudofermion 
action different from that of two flavors QCD with degenerate masses (|14|) . A straightforward 
approach is to take the inverse square root of the quark matrix (and the quarter-root of the 
Pauli-Villars matrix) of the two flavors QCD, namely 



Nf=l 



0^(C7|C7i)i/4(cct)-i/2(C7tc^)i/4^_ 



However, the inverse square root and the quarter-root operations cannot be evaluated exactly 
since the cost of solving the eigenproblem of CC^ or CjCi is prohibitive. A way to resolve this 
problem is to use the optimal rational approximations for (CC^^)~^/^ and (CjCi)^/^ respectively, 
similar to the rational hybrid Monte Carlo (RHMC) |22j . 

Recently, Chen and Chiu have constructed a novel DWF action for one flavor QCD [23j, 
which is exact, without taking the square root, and amenable to HMC simulation. This novel 
pseudofermion action for one flavor QCD can be written as 



Nj=l 
^Pf 







/ — C-V_UJ 



Him) 



I + c+vluj-^/^ 



H(l)-A.(m)P, 



+ 







(23) 



where (pi and (/>2 are complex scalar fields on the 4-dimensional space-time lattice, with color 
and 2-spinor indices, and 



= 75^5 [D^ + M+P+ + Af_P_] , {R5)s,s' = Ss',N,+l-s, 

A±(m) = i?5[M±(l)-M±(m)]. 



n -1^-1/2 



The coefficients c± and the vectors v± in ()23p only depend on the parameters c, d, and 
{uJs,s = ,Ns}, and they will be shown explicitly in Ref. [23j. A detailed description 

of our HMC simulations will be presented in a forthcoming paper. 



4. Gauge Ensembles 

For zero temperature QCD, we have been working on the following ensembles of two flavors 
QCD with ODWF: 

• 16^ X 32 (a ~ 0.1 fm), with plaquette gauge action at /? = 5.95, for 8 sea-quark masses 
corresponding to pion masses in the range 230-560 MeV |13[ I15|. 

• 20^ X 40 (a ~ 0.1 fm), with plaquette gauge action at /? = 5.95, for 6 sea-quark masses 
corresponding to pion masses in the range 230-450 MeV |24) . 

• 24^ X 48 (a ~ 0.1 fm) with plaquette gauge action at /3 = 5.95, for 4 sea-quark masses 
corresponding to pion masses in the range 230-450 MeV. 

The first two lattice sizes (16^ x 32, and 20^ x 40) have been completed, while the third lattice 
size (24'^ X 48) is expected to be completed in the summer 2013. 

For finite temperature QCD, we have been working on the following ensembles of two flavors 
QCD with ODWF: 

• 16^ X 6, with plaquette gauge action, for 30-40 /3 values in the range [5.00, 5.95], each of 3-4 
sea-quark masses. 

• 24^ X 8, with plaquette gauge action, for 30-40 /3 values in the range [5.00, 5.95], each of 3-4 
sea-quark masses. 

The first lattice size (16^ x 6) has been completed, while the lattice size (24'^ x 8) is still in 
progress. 

For the quark part, we use ODWF with c = 1, d = (i.e., H = H^)-, Ng = 16, and 
^min/^max = (0.01, 0.02, 0.05)/6.2, where different values of Xmin have been used for different 
gauge ensembles. In general, for each sea-quark mass, we generate the initial 400-500 trajectories 
on a single GPU. After discarding 200-300 trajectories for thermalization, we sample one 
configuration every 5 trajectories, resulting 20-32 "seed" configurations for each sea-quark mass. 
Then we use these seed configurations as the initial configurations for independent simulations 
on 20-32 CPUs. Each GPU generates 200-250 trajectories independently. Then we accumulate 
total 5000 trajectories for each sea-quark mass. From the saturation of the binning error of the 
plaquette, as well as the evolution of the topological charge, we estimate the autocorrelation 
time to be around 10 trajectories. Thus we sample one configuration every 10 trajectories, and 
obtain 500 configurations for each sea-quark mass. 

For each configuration, we calculate the zero modes plus 80-180 conjugate pairs of the lowest- 
lying eigenmodes of the overlap Dirac operator. We outline our procedures as follows. First, 
we project 250 low- lying eigenmodes of H^, using adaptive thick restart Lanczos algorithm (a- 
TRLan), where each eigenmode has a residual less than 10~^^. Then we approximate the sign 
function of the overlap operator by the Zolotarev optimal rational approximation with 64 poles, 
where the coefficients are fixed with A^^^, = (6.4)^, and A^^^ equal to the maximum of the 250 
projected eigenvalues of H^. Then the sign function error is less than 10~^^. Using the 250 
low-modes of and the Zolotarev approximation with 64 poles, we use the a-TRLan algorithm 
again to project the zero modes plus 80-180 conjugate pairs of the lowest-lying eigenmodes of the 
overlap operator, where each eigenmode has a residual less than 10^^^. We store all projected 
eigenmodes for the later use. 

In Figs. [UlJl we present the essential characteristics of our HMC simulations, which are quite 
universal for any lattice sizes and sea-quark masses. In Fig. [1] (a), we plot the topological 
charge versus the trajectory in the HMC simulation of two fiavors QCD with ODWF. The 
lattice is 16^ x 32 x 16 with the spatial box ~ (2 fm)^, and the sea-quark mass corresponding 
to ~ 360 MeV. All these HMC trajectories are computed in one GPU, with a total of 
543 trajectories, and the inital 200 trajectories are discarded for thermalization. We see that 
the topological charge has a short auto-correlation time, and each topological sector is sampled 
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Figure 1. (a) The topological charge versus the trajectory in the HMC simulation of two flavors 
QCD with ODWF. The lattice is 16^ x 32 x 16 with the spatial box ~ (2 fm)^, and the sea-quark 
mass corresponding to ~ 360 MeV. The line connecting the data points is only for guiding 
the eyes, (b) The histogram of the topological charge distribution in (a). 
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Figure 2. (a) The AT-L versus the trajectory in the HMC simulation of two flavors QCD with 
ODWF. The lattice is 16^ x 32 x 16 with the spatial box ~ (2 fm)'^, and the sea-quark mass 
corresponding to M^^ ~ 360 MeV. The line connecting the data points is only for guiding the 
eyes, (b) The maximum forces of the gauge field, heavy fermion field, and light fermion field. 



ergodically. Moreover, the topological charge distribution behaves like a Gaussian distribution 
with the mean close to zero, as shown in Fig. [1] (b). In Fig. [2] (a), we plot the change of 
Hamiltonian of each trajectory, which is quite smooth, without any spikes in all trajectories. 
Using the measured value of (AT-l) = 0.00167(240), we estimate the theoretical acceptance 

rate Phmc = erfc {A'H)/2^ = 0.977(10), which agrees with the measured acceptance rate 

0.983(7). Moreover, the measured value of (exp(— A?^)) = 0.9993(23), in good agreement with 
the condition {exp(—A'H)) = 1 which follows from the area-preserving property of the HMC 
simulation. In Fig. [2] (b), we plot the maximum force (averaged over all links) among all 
momentum updates in a trajectory, for the gauge field, the heavy fermion field, and the light 
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Figure 3. (a) The time-correlation function of the pseudoscalar meson for six sea-quark masses, 
(b) The effective mass of (a). The dashed line connecting the data points of the same sea-quark 
mass is for guiding the eyes. 



fermion field respectively. The forces all behave smoothly for all trajectories. 
5. Some recent results 

To verify whether a dynamical simulation of lattice QCD captures the nonperturbative chiral 
dynamics of the sea-quarks, a prerequisite is to examine to what extent the pion mass (M^) 
and decay constant (Fj^) agree with the sea-quark mass dependence as predicted by the next-to- 
leading-order chiral perturbation theory (NLO ChPT) [16j, and to check whether the resulting 
low-energy constants are reasonable or not. Furthermore, to check whether a dynamical 
simulation of lattice QCD samples all topological sectors ergodically, it is necessary to measure 
the topological susceptibility versus the sea-quark mass, and to compare the results with the 
sea-quark mass dependence as predicted by the NLO ChPT |14) . We have performed these tests 
for the gauge ensembles on the 16^ x 32 x 16 lattice [T3l [T5]. 

In this section, we present preliminary results of these tests for the gauge ensembles on the 
20^ X 40 X 16 lattice [21]. 

Using the valence quark propgator with quark mass equal to the sea-quark mass, we compute 
the time-correlation function of the pseudoscalar interpolator 



where the trace runs over the Dirac and color space. Then the ensemble average (C7r(t)) of each 
niq is fitted to the formula Z[e~^^'""* + e~^'^^"'^'^~''^^]/ (IM-j^a) to extract the pion mass Mj^a and 
the decay constant Fj^a = mqa^/2Z / {M.,^aY . 

In Fig. m we plot {Mt^o)"^ / {mqo) and F^a versus m^a respectively. Here we have made 
the correction for the finite volume effect using the estimate within ChPT calculated up to 
0{M^/{At^Ft^)'^) pS]. Taking into account of the correlation between M'^/niq and for the 
same sea-quark mass, we fit our data to the formulas of NLO ChPT, 
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Figure 4. Pseudoscalar meson of 2 flavors QCD with ODWF (a) {Mt^o)^ / {mqo), and (b) F^a. 
The soHd Hnes are the simultaneous fits to the NLO ChPT, for six sea-quark masses. 



where A3 and A4 are related to the low energy constants and /4 as follows. 

[3 = In , l4 = ln (-^) , m^± = 0.140 GeV. 

Our procedure of data fitting to extract the parameters (E, F, A3 and A4) has been outlined 
in Ref. [15]. For six sea-quark masses, our fit gives 

Sa^ = 0.00122(6) (2), Fa = 0.0414(10)(16), , . 

/3 = 3.829(105)(43), /4 = 4.755(93)(23), ^ ' 

where the systematic errors are estimated by varying the number of data points from 6 to 4 
{rriga < 0.04 ). 

With the fitted parameters, we use the physical ratio 

'MA"^''" 0.135 GeV , 

1.45 



J 0.093 GeV 

as the input, and solve the equation M.,^{mq)/ F.,^{mq) = 1.45 to obtain the physical bare 
quark mass rn^^'^a = 0.00254(10)(16). Prom ()25p and the physical pion decay constant 
= 92.6 MeV, we determine the inverse lattice spacing at the physical point, 

1/a = 2.076(6)(5) GeV. 

From (|24p . we obtain the pion mass at the physical point, M,r = 0.134(5) (3) GeV, which serves 
consistency check. 

In order to convert the chiral condensate S and the average niu and rud to those in the 
MS scheme, we calculate the renormalization factor Z^^(2 GeV) using the non-perturbative 
renormalization technique through the RI/MOM scheme [26], which gives Z^^(2 GeV) = 
1.244(18)(39). Then the values of S and the average of rriu and are transcribed to 

E^(2 GeV) = [238(10)(6) MeV]^ (27) 
m^(2 GeV) = 4.07(13) (12) MeV. (28) 



Our results of the chiral condensate ()27p and the average up and down quark mass ()28p are in 
good agreement with our previous ones on the 16^ x 32 lattice flSj. Since our calculation is done 
at a single lattice spacing the discretization error cannot be quantified reliably, but we do not 
expect much larger error because our lattice action is free from 0(a) discretization effects. 





Figure 5. Histogram of topological charge distribution for six sea-quark masses (preliminary 
results with ~ 200 configurations for each ensemble). 



For the projection of low-lying eigenmodes (zero modes plus 180 pairs of lowest-lying 
eigenmodes) of the overlap operator for each configuration, we have completed about half of 
the total (500 x 6 = 3000) configurations. In Fig. \5\ we plot the histogram of topological charge 
distribution for m^a = 0.01, 0.02, •• • ,0.06 respectively. Evidently, the probability distribution 
of Qt for each sea-quark mass behaves like a Gaussian distribution, and it becomes more sharply 
peaked around = as the sea-quark mass nig gets smaller. We will measure the topological 
susceptibility and related physical quantities after the projections of all 3000 configurations are 
completed. 

6. Conclusion and Outlook 

Chiral symmetry plays an important role in particle physics. It is one of the most salient features 
of the massless fermion field, a consequence of the principles of quantum mechanics and special 
relativity. Thus it is vital to preserve the chiral symmetry at finite lattice spacing. Theoretically, 
this is realized by the domain-wall fermion with infinite extent {Ng — )• oo) in the fifth dimension, 
or the overlap fermion in the four dimension. However, in practice, it is a rather challenging 
problem to perform dynamical simulations of lattice QCD with overlap or domain-wall fermion 
such that the chiral symmetry is preserved at a high precision and all topological sectors are 
sampled ergodically. Naively, one might expect that a high precision of chiral symmetry can be 



attained with a sufficiently large Ng. However, the relevant question is how the chiral symmetry 
violation (e.g., the residual mass) decreases with respect to Ns. 

In this talk, it has been shown that lattice QCD with optimal domain-wall fermion (ODWF) 
provides a viable approach to perform large-scale simulations of unquenched QCD, which not 
only preserves the chiral symmetry to a good precision, but also samples all topological sectors 
ergodically. The upper-bound of the residual mass for lattice QCD with ODWF [11] asserts that 
for Ng less than some threshold value (~ 16 — 18), the residual mass decays exponentially with 
respect to Ng. Thus, for lattice QCD with ODWF, chiral symmetry can be preserved to a good 
precision {rriresO' ~ 10~^) with a modest Ns ~ 16. On the other hand, for lattice QCD with 
the conventional DWF, the residual mass behaves like N~°' (a ~ 1 — 2) for any Ng J27j. Thus 
it is difficult for the conventional DWF to attain niresCL ~ 10~^, even for Ng ~ 32. Another 
approach to avoid large chiral symmetry violations is to use smeared links rather than thin links, 
since the number of low-lying eigenmodes of H = ci7^(l -|- d'y^Hw)~^ would be largely reduced 
for smeared links. However, it is unclear to what extent the short-distance physics would be 
affected even if the smeared links are only used in the lattice fermion operator. 

Finally, we briefly outline the new project of TWQCD Collaboration. Now, after simulating 
2-flavors QCD on various lattices (163x32, 20^x40, 24^x48), we are ready to perform dynamical 
simulations of (2 -|- l)-flavors and (2 -|- 1 -|- l)-flavors QCD on the 32^ x 64 lattice, with pion 
mass close to the physical value, and residual mass rriresO ~ 10~^. We outline our strategy as 
follows. In order to compute the fermion force (by conjugate gradient) for lattice QCD with 
DWF on the 32^ x 64 x 16 lattice, it requires at least 11 GB RAM, exceeding the maximum 
memory (6 GB) currently available in a single GPU (Nvidia C2070/K20x). In other words, we 
must use multiCPUs to meet the memory requirement, as well as to speed up the computation. 
Recently, we have developed efficient CUDA codes for the computation of entire HMC trajectory 
with multiGPUs. In Tabled! we summarize our benchmark for various Nvidia GPUs ^28j. This 
suggests that it is feasible to perform dynamical simulations of (2 -|- l)-flavors and (2 -|- 1 -|- 1)- 
flavors QCD on the 32^ x 64 x 16 lattice, with a GPU cluster of Nvidia K20/C2075/GTX680. 
Details will be presented in a forthcoming paper. 

Table 1. Benchmark of Nvidia GPUs, using HMC simulation of lattice QCD with ODWF on 
the 32'^ X 48 X 16 lattice. All numbers are in units of Gflops/sec. 



2*C2070 


2*K20c 


4*GTX680 


340 


535 


945 
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